Convolutional LSTM Networks for Video-based Person Re-identification
Authors
Abstract
In this paper, we present an end-to-end approach that simultaneously learns spatio-temporal features and a corresponding similarity metric for video-based person re-identification. Given the video sequence of a person, features are extracted from each frame at all levels of a deep convolutional network, which preserves a higher spatial resolution from which finer motion patterns can be modeled. These low-level visual percepts are fed into a variant of a recurrent model to characterize the temporal variation between time-steps. Features from all time-steps are then summarized using temporal pooling to produce an overall feature representation for the complete sequence. The deep convolutional network, recurrent layer, and temporal pooling are jointly trained to extract comparable hidden-unit representations from an input pair of time series and to compute their similarity value. The proposed framework combines time series modeling and metric learning to jointly learn relevant features and a good similarity measure between time sequences of persons. Experiments demonstrate that our approach achieves state-of-the-art performance for video-based person re-identification on iLIDS-VID and PRID 2011, the two primary public datasets for this purpose.
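The pipeline described in the abstract (per-frame features, a recurrent layer, temporal pooling, and a similarity value for a pair of sequences) can be illustrated with a minimal sketch. This is not the paper's model: the per-frame features are assumed to be given as plain vectors rather than produced by a convolutional network, a simple element-wise tanh recurrent unit stands in for the convolutional LSTM, the weights `w_in` and `w_rec` are fixed hypothetical scalars rather than trained parameters, and Euclidean distance is used as an assumed similarity measure.

```python
import math
from typing import List

Vector = List[float]


def recurrent_step(x: Vector, h: Vector, w_in: float, w_rec: float) -> Vector:
    """One step of an element-wise tanh recurrent unit.

    Assumption: a simplified stand-in for the convolutional LSTM in the paper.
    """
    return [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h)]


def encode_sequence(frames: List[Vector], w_in: float = 0.5, w_rec: float = 0.5) -> Vector:
    """Run the recurrent unit over all frames, then mean-pool hidden states over time."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    h = [0.0] * dim
    hidden_states = []
    for x in frames:
        h = recurrent_step(x, h, w_in, w_rec)
        hidden_states.append(h)
    # Temporal pooling: average each feature dimension across all time-steps.
    steps = len(hidden_states)
    return [sum(hs[i] for hs in hidden_states) / steps for i in range(dim)]


def sequence_distance(seq_a: List[Vector], seq_b: List[Vector]) -> float:
    """Encode both sequences with the same (shared) weights and compare them.

    Assumption: Euclidean distance as the similarity measure; smaller means more similar.
    """
    return math.dist(encode_sequence(seq_a), encode_sequence(seq_b))


if __name__ == "__main__":
    # Hypothetical two-dimensional per-frame features for two short sequences.
    walk_a = [[0.1, 0.2], [0.3, 0.4]]
    walk_b = [[0.9, -0.5], [0.2, 0.1]]
    print(sequence_distance(walk_a, walk_a))
    print(sequence_distance(walk_a, walk_b))
```

Both sequences pass through the same encoder with shared weights, which is what makes the pooled representations comparable. In a trained system the encoder weights would be learned jointly with the metric instead of being fixed.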
Similar resources
Learning Compact Appearance Representation for Video-based Person Re-Identification
This paper presents a novel approach for video-based person re-identification using multiple Convolutional Neural Networks (CNNs). Unlike previous work, we intend to extract a compact yet discriminative appearance representation from several frames rather than the whole sequence. Specifically, given a video, the representative frames are selected based on the walking profile of consecutive fram...
Three-Stream Convolutional Networks for Video-based Person Re-Identification
This paper aims to develop a new architecture that can make full use of the feature maps of convolutional networks. To this end, we study a number of methods for video-based person re-identification and make the following findings: 1) Max-pooling only focuses on the maximum value of a receptive field, wasting a lot of information. 2) Networks with different streams even including the one with t...
Cross Domain Knowledge Transfer for Person Re-identification
Person Re-Identification (re-id) is a challenging task in computer vision, especially when there are limited training data from multiple camera views. In this paper, we propose a deep learning based person re-identification method by transferring knowledge of mid-level attribute features and high-level classification features. Building on the idea that identity classification, attribute recogni...
Script Identification in Natural Scene Image and Video Frame using Attention based Convolutional-LSTM Network
Script identification plays a significant role in analysing documents and videos. In this paper, we focus on the problem of script identification in scene text images and video scripts. Because of low image quality, complex backgrounds, and the similar character layouts shared by some scripts such as Greek and Latin, text recognition in those cases becomes challenging. Most of the recent approaches...
Person Re-Identification by Localizing Discriminative Regions
Person re-identification is a challenging task of matching a person’s image across multiple images captured from different camera views. Recently, deep learning based approaches have been proposed that show promising performance on this task. However, most of these approaches use whole image features to compute the similarity between images. This is not very intuitive since not all the regions ...